ABSTRACT



This article presents the results of a pilot longitudinal 
study that attempted to develop a method to take subjective, qualitative 
observations about the English language speaking skills of Japanese English 
language learners and transform them into objective, quantitative measures. 
The following considerations must be addressed in the course of constructing 
this measure: what are the appropriate expectations of proficiency of a given 
student? Which skills should be mastered, to what level, and in what order? 
Are the instruments used valid measures, encompassing all the proper 
variables? Are the raters evaluating the students consistently? How can 
results be compared from speech to speech, class to class, year to year? 

Three types of tests are used: a monologue speaking test (presentation); a
dialogue speaking test (interview); and a multilogue speaking test
(discussion and debate). (KFT)
1. Theoretical background and rationale

Communication skills are highly valued in today’s job market, and increasingly rapid
changes in the workplace have made management aware of the importance of competent
communicators (Tatum 1998). It follows that, as business grows increasingly global,
students need English oral communication skills when they graduate from university
if they are to be competitive in the job market.
Communication classes are now firmly entrenched in universities that teach English as 
a second or foreign language. However, many students are still graduating with little 
more than elementary “survival English” skills. As language teachers, it is crucial that 
we enhance students’ delivery skills, increase students’ confidence, and develop students’ 
methods of organization and critical thinking skills. As language testers, it is necessary 
for us to establish a careful research design and conduct a precise measurement to 
determine whether these goals have been met. The oral communication field needs a
clear-cut method of evaluation of the kind found in discrete language skill classes such
as listening comprehension. Language teachers and language testers need a method which
takes subjective, qualitative observations and transforms them into objective,
quantitative measures. What we present here is a discussion of our ongoing pilot
project, which is attempting to reach these goals.

2. Purpose of the research

This paper presents the results of an in-progress pilot study concerned with the
previously mentioned assumptions and addresses the following considerations:
1. What are appropriate expectations of proficiency of an undergraduate student
or a graduating university student?

2. Which skills should be mastered, to what level, and in what order?

3. Are the evaluation instruments used sound in that they cover the range of the
variables, all of the items fit, and all the items measure what they are intended
to measure?

4. Are the raters evaluating students consistently?

5. How can the results be compared from speech to speech, class to class, or year
to year?

In addressing the question of what proficiency can appropriately be expected of
university students, our initial assumptions are based on a study of Japanese
graduate students at Keio University. In that study, graduate students were asked
to cite what they felt were the most important or useful English skills for them to
learn; the resulting list was conversation, presentation, discussion, and debate
(Hiyoshi Review, 2000). These, in turn, are the areas we decided to focus on.

The degree to which our evaluation instruments are sound, in the sense that they cover
the range of the variables, that all of the items fit, and that they measure what we
intended them to measure, will be determined at the end of the term. The university at
which this study is being conducted runs on a year-long course system, and students are
halfway through the school year at the time of writing.

This is a two-year longitudinal study, so we have been able to compare the students’
performance across last year and this year. The students had a non-native
English-speaking teacher for the first year and are currently with a native English
speaker for the second year of this project.

3. Research design and methods

3.1. Three types of speaking tests 

1) Monologue speaking test (presentation)

• show and tell - Students are allowed to talk about anything of their choosing.
This activity focuses on giving students one of their first opportunities to
make a small presentation in English, so it is short in time and varied in topics.

• truth/lie story - Students tell stories. Other students in the class have to decide
when they are telling the truth and when they are lying.

• class presentation - Students talk about their university majors and seminars.
They are expected to go into more detail than with the show and tell
activity and use more techniques generally associated with proficient
presentation skills.

2) Dialogue speaking test (interview) - This is an open-ended, student-led
discussion with the teacher. Just as a “real” conversation is not rehearsed or
written in advance, neither is this test. Students are told in advance that they
will be required to use the conversation skills they have learned throughout
the course to lead a one-to-one conversation with the teacher. Each individual
student is in charge of choosing the topic and regulating the flow of the
conversation. Because of this, background knowledge and similar factors are not
considered an issue. The conversation lasts for approximately ten minutes.

3) Multilogue speaking test (discussing and debating) - The discussions are
student-generated. Students are put into groups, and as a group,
students decide on a topic they feel would be of interest to the rest of the
class. Next, students prepare two sets of questions. One set is a list of
ten multiple-choice questions based on the topic their group has chosen. The
other is a list of five questions to guide the group discussions that will follow.
After this, students are put into new groups, so that one member from each of the
original groups is in each of the newly established groups. Taking turns, each
student is then put in charge of leading their new group in a discussion after
the other members of the discussion group have completed the multiple-choice
questionnaire. Each student is, in turn, the group leader for one 90-minute
class that focuses on group conversations based on the topic the
original groups chose. The following chart illustrates this procedure:
Group A (four students)     Group C (four students)

Group B (four students)     Group D (four students)

Each group decides on a discussion topic and then writes both multiple-choice
questions and group discussion questions.

New groups are then made up of one member from each of the
original groups:

ABCD     ABCD     ABCD     ABCD

In turn, each member runs a 70-minute conversation based on the topic
chosen by the original groups.
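
The regrouping step in the chart can be sketched in a few lines of Python. This is only an illustration: the student labels (A1 to D4) and the function name `regroup` are hypothetical and do not come from the study.

```python
def regroup(original_groups):
    """Form new groups that each take one member from every original group.

    original_groups: list of equally sized lists of student labels.
    New group i holds the i-th member of each original group.
    """
    sizes = {len(group) for group in original_groups}
    if len(sizes) != 1:
        raise ValueError("all original groups must have the same size")
    # zip(*...) transposes the groups, so each new group mixes A, B, C and D.
    return [list(members) for members in zip(*original_groups)]


# Hypothetical labels: four original groups (A-D) of four students each.
original = [[f"{name}{i}" for i in range(1, 5)] for name in "ABCD"]
new_groups = regroup(original)
```

With four original groups of four students, this yields four new groups, each of the form ABCD, as in the chart.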



The final evaluation will be done after a unit on how to debate has been completed.
At that time, students’ ability to debate in English will be evaluated. This will take
place in the last semester of the second academic year.

3.2. Procedure of the research 

1) Subjects

Twenty-six Japanese college students majoring in Business Administration and
Economics. Approximately 70% of the students are male and 30% are
female.

2) Raters

Two classroom teachers (M: a native speaker of English; Y: a non-native
speaker of English)

3) Rating items and criteria
Evaluation items:

Presentations:

• content

• language

• eye contact

Interviews:

• comprehensibility

• pronunciation

• fluency

• ability to explain an idea

Discussing and Debating:

• able to be part of the conversation to help it flow naturally
(including times other than when asked a direct question)

• uses fillers/additional questions to include others in conversation

• transfers skills used in dialogues to group discussions

4) Rating procedure

The presentation test was rated by the non-native speaker of English, and the 
interview test and the discussion (and/or debating) test were rated by the 
native speaker of English. 

5) Rating scale

The rating scale used in the analysis was a four-point scale as follows:

1       2       3       4
poor                    good

4. Data analyses

The following data analyses will be conducted to answer the research questions
mentioned above.

1. Inter-rater reliability of the two raters

2. T-test analysis between the mean scores of the two raters

3. Descriptive statistics of the three tests (task difficulty)

4. Item difficulty (10 items)

5. Internal consistency of the 10 items

6. Factor analysis

7. Correlations between raters and items
5. Results and discussion

5.1. Inter-rater reliability 



Table 1 Inter-rater reliability between the two teachers

                             N      correlation    sig.
M overall and Y overall      26     .746           .001



Since the three tests were not all rated by the same teacher, as mentioned in the
procedure above, we first look at the inter-rater reliability to examine the rating
consistency between the two teachers, using their overall evaluations of the 26 students.

Table 1 shows the inter-rater reliability between the two teachers, which is
reasonably high and acceptable for an oral assessment. Although this correlation is
calculated from both teachers’ overall evaluation of each student (not from individual
items), it is adequate for grasping the rating tendency of the two raters. The
coefficient (.746) therefore allows us to rely on the two teachers’ evaluations of the
three tests that follow.
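
As an illustration of how such a coefficient can be computed, here is a minimal sketch that assumes the reported value is a Pearson product-moment correlation (the paper does not name the coefficient). The ratings below are hypothetical, not the study's data, and the function name `pearson_r` is ours.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical overall ratings on the four-point scale (NOT the study's data).
rater_m = [2, 3, 3, 4, 1, 2, 3, 4]
rater_y = [3, 3, 4, 4, 2, 2, 3, 3]
r = pearson_r(rater_m, rater_y)  # roughly 0.73 for these made-up ratings
```

A value near 1 means the two raters rank students in much the same way. It does not show that they use the scale equally severely, which is why the mean comparison in the next section is also needed.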

5.2. The t-test results of the mean scores between the two teachers 



Table 2 The t-test results (by the mean difference) between the two teachers
using the overall evaluation

               M       N      SD      t         df     sig.
Native         2.77    26     .71     -1.995    25     .057
Non-native     2.99    26     .66

In order to check whether there is a significant difference in rating (harshness or
leniency, etc.) between the two teachers, we use a t-test to investigate
statistically the mean difference between the two raters.

Table 2 suggests that there is no significant difference between the mean scores of
the two raters (p = .057, not significant at the .05 level).

Table 1 and Table 2 together indicate that the two classroom teachers rate consistently
with each other and that there is no significant difference in their ratings in terms
of harshness or leniency.
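
The degrees of freedom in Table 2 (df = 25 with N = 26) suggest a paired-samples t-test, in which each student's two overall ratings are compared. Under that assumption, the statistic could be computed roughly as follows. The data are hypothetical and the function name `paired_t` is ours.

```python
import math


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x - y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    n = len(x)
    diffs = [a - b for a, b in zip(x, y)]
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Hypothetical overall ratings (NOT the study's data).
native = [2, 3, 3, 4, 1, 2, 3, 4]
non_native = [3, 3, 4, 4, 2, 2, 3, 3]
t_stat, df = paired_t(native, non_native)
```

The absolute value of the statistic is then compared with the two-tailed critical value of the t distribution for n - 1 degrees of freedom, which is about 2.06 for df = 25 at the .05 level.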
5.3. Descriptive statistics of three tests



Table 3 Descriptive statistics of three tests

         Presentation    Interview    Discussion
M        3.21            2.76         2.41
SD       .70             .63          .69
Min      1               1.5          1
Max      4               4            3.67
N        26              26           26

Table 3 suggests that the discussion test is the most difficult task for students,
followed by the interview test and then the presentation test. One possible explanation
is that the presentation test, which can be prepared in advance, is easier than the
other two, while the discussion test, which requires complex interactive ability from
students, is the most difficult.

5.4. Item Difficulty 



Table 4 Item difficulty (10 rating items)

        Easy                                                            Difficult
Item    Pcn     Pln     Ipr     Pey     Dtr     Icm     Ifl     Iab     Dhl     Duse
M       3.54    3.27    3.08    2.85    2.77    2.69    2.65    2.65    2.35    2.12
SD      .76     .83     .63     .88     .65     .74     .80     .75     .80     .91

N.B.

Symbols used in the table:

Presentations:

• content (Pcn)

• language (Pln)

• eye contact (Pey)

Interviews:

• comprehensibility (Icm)

• pronunciation (Ipr)

• fluency (Ifl)

• ability to explain an idea (Iab)

Discussions and Debating:

• able to be part of the conversation to help it flow naturally (including times other
than when asked a direct question) (Dhl)

• uses fillers/additional questions to include others in conversation (Duse)

• transfers skills used in dialogues to group discussions (Dtr)

Table 4 suggests that two items in the presentation test (content and language) are the
easiest. This is probably because students can prepare and improve these in advance
during their writing process, whereas eye contact in group situations is more difficult
to practice. Even pronunciation in the interview test can be practiced individually
beforehand. However, the other items, which are more closely related to interaction,
negotiation and listening ability, are more difficult for students.

5.5. Internal consistency of all 10 items as a whole communication test

Table 5 Internal consistency of 10 items as a whole communication test
Reliability coefficient Alpha = .90

Table 5 shows the internal consistency of the 10 items. The reliability coefficient
(Alpha = .90) is high and acceptable as a measure of internal consistency. Therefore,
we can claim that the 10 items of the three tests are measuring the students’ general
communication ability consistently.
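
Assuming the reported Alpha is Cronbach's alpha, which is the usual meaning of this label, a minimal sketch of the calculation is shown below. The ratings are hypothetical (five students, three items) and the function name `cronbach_alpha` is ours.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student rows of item scores."""
    if len(scores) < 2:
        raise ValueError("need at least two students")
    n_items = len(scores[0])
    if n_items < 2 or any(len(row) != n_items for row in scores):
        raise ValueError("every row needs the same number of items (>= 2)")

    def sample_variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    # Variance of each item across students, and of each student's total score.
    item_variances = [
        sample_variance([row[i] for row in scores]) for i in range(n_items)
    ]
    total_variance = sample_variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings on the four-point scale (NOT the study's data).
ratings = [
    [4, 4, 3],
    [3, 3, 3],
    [2, 3, 2],
    [1, 2, 1],
    [3, 4, 4],
]
alpha = cronbach_alpha(ratings)  # 0.9375 for these made-up ratings
```

Alpha rises when students who score high on one item also tend to score high on the others, so a high value indicates that the items vary together. It does not by itself show that they measure a single ability, which the factor analysis below examines.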



5.6. Factor analysis 



Table 6 Results of factor analysis

                  Component
                  1)         2)
Prcontent                    .887
Prlanguage                   .925
Preyecon          .459       .648
Incompre          .923
Inpronoun         .764       .389
Influency         .899
Inability         .762
Dihelp            .851
Diuse             .807       .346
Ditrans           .671       .308
% of variance     48.831     25.406

N.B. A blank space means that the factor loading is below .300.
Table 6 shows the results of the factor analysis, which suggest two factors. Although
we started with three tests (Monologue: Presentation test, Dialogue: Interview test,
Multilogue: Discussion and Debating), the analysis suggests that overall communication
ability has two components. One appears to combine dialogue (interview) and multilogue
(discussion), in other words interactive and reciprocal ability, while the other is
one-way, simple presentation ability. Since the number of students as well as the
number of items is small, it is difficult to generalize from these results. However,
there seems to be a difference between the monologue-type ability and the
dialogue-multilogue ability. Further investigation is left for future research.

5.7. Correlations between raters and items 



Table 7 Correlations between raters and items

               M overall         Y overall
Prcontent      .239              .360
Prlanguage     .246              .385
Preyecont      .389*             .607***
Incompre       .853***           .631***
Inpronun       .669***           .777***
Influency      [illegible]***    .732***
Inability      .674***           .701***
Dihelp         .712***           .633***
Diuse          .724***           .739***
Ditrans        .485*             .535**
N              26

N.B. 1) ***p<.001, **p<.01, *p<.05

2) M overall: a native speaker,
Y overall: a non-native speaker

Table 7 indicates that the raters’ (teachers’) overall judgments share some
information with the items of two of the tests (the interview test and the discussion
test). However, the presentation test has less in common with the two raters’ overall
judgments. This could also support the result of the factor analysis that the
presentation test should be treated rather separately from the other two (dialogue and
multilogue) tests.
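
The significance stars in Table 7 can be checked approximately by converting each correlation to a t statistic with N - 2 degrees of freedom. The sketch below assumes Pearson correlations and two-tailed tests, neither of which the paper states. The critical values are standard t-table values for df = 24, and the function names are ours.

```python
import math

# Two-tailed critical t values for df = 24 (N = 26), from standard t tables.
# Assumption: the paper's stars come from two-tailed tests of r = 0.
CRITICAL_T_DF24 = ((3.745, "***"), (2.797, "**"), (2.064, "*"))


def r_to_t(r, n):
    """t statistic for testing that a Pearson correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def stars_for_n26(r):
    """Significance stars for a correlation based on 26 paired scores."""
    t = abs(r_to_t(r, 26))
    for critical, label in CRITICAL_T_DF24:
        if t > critical:
            return label
    return ""


print(stars_for_n26(0.853), stars_for_n26(0.485), repr(stars_for_n26(0.239)))
```

Coefficients that lie very close to a threshold, such as .607, can fall on either side of it depending on how the published value was rounded, so this check is only indicative.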
6. Conclusions and implications

We can draw the following conclusions from the previous results and discussion. 

1. The two raters are evaluating students’ oral proficiency consistently, within the
acceptable range of rater reliability.

2. Among the three tests (tasks), the discussion test seems to be the most difficult,
followed by the interview test and the presentation test. In other words,
multilogue ability is the most difficult, followed by dialogue ability and then
monologue ability. Although we are not yet sure of the appropriate expectations
for the three abilities at this stage, the dialogue and multilogue abilities
clearly need to be enhanced.

3. Related to the previous result, the difficulty order of the three tests
(difficult to easy) is Multilogue, Dialogue and Monologue. If we consider the 10
items that measure overall communication ability separately, the difficulty
order (easy to difficult) is Presentation content, Presentation language,
Interview pronunciation, Presentation eye contact, Discussion transfer,
Interview comprehensibility, Interview fluency, Interview ability,
Discussion help and Discussion use. As mentioned above, the items in
Multilogue and Dialogue deserve more attention. In particular, the two most
difficult items (Discussion help and Discussion use) should be enhanced.

4. Judging from the results for internal consistency, the 10 items have functioned
properly as individual rating items for measuring students’ communicative
ability.

5. One finding, made clear by the correlations, is that the classroom teachers’
overall ratings do not predict presentation ability well. There must be
something unique about presentation ability. Both this result and the two
components suggested by the factor analysis urge us to reconsider
communication ability as a whole.

6. Listening in the discussion is different from listening in the interview, because
the former requires students to pay more attention to a third speaker. In order
to improve discussion ability, one of the important elements is to enhance
students’ listening ability in discussions (especially where native speakers
are involved). Listening ability in discussions could be improved in a
classroom situation as a training course, by using taped conversations in which
more than three people are talking.

7. The presentation evaluation was conducted last year, and the interview test
and the discussion test were administered in the middle of the current year as
interim report grades; therefore, we cannot make any clear-cut generalization.
However, future research with more data should give a clearer picture of
communication ability.